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Excitonic models of light-harvesting complexes, where the vibrational degrees of freedom are 
treated as a bath, are commonly used to describe the motion of the electronic excitation through 
a molecule. Recent experiments point toward the possibility of memory effects in this process and 
require to consider time non-local propagation techniques. The hierarchical equations of motion 
, (HEOM) were proposed by Ishizaki and Fleming to describe the site-dependent reorganization dy- 

■ namics of protein environments (J. Chem. Phys., 130, p. 234111, 2009), which plays a significant 

^ ' role in photosynthetic electronic energy transfer. HEOM are often used as a reference for other 

1*0. approximate methods, but have been implemented only for small systems due to their adverse com- 

putational scaling with the system size. Here, we show that HEOM are also solvable for larger 
systems, since the underlying algorithm is ideally suited for the usage of graphics processing units 
(GPU). The tremendous reduction in computational time due to the GPU allows us to perform 
a systematic study of the energy-transfer efficiency in the Fenna-Matthews-Olson (FMO) light- 
harvesting complex at physiological temperature under full consideration of memory-effects. We 
find that approximative methods differ qualitatively and quantitatively from the HEOM results and 
. discuss the importance of finite temperature to achieve high energy-transfer efficiencies. 
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I. INTRODUCTION 



Light-harvesting complexes (LHC) are pigment protein-complexes that act as the functional units of photosynthetic 
ps) . systems, capable of absorbing the energy of a photon and transferring it towards the reaction center where it is 
^ ' converted into chemical energy usable for the cell. The transfer of energy in such systems is described by electronic 
04 ! exciton-dynamics coupled to the vibrations and other mechanical modes of the complex ij. Laser spectroscopy shows 
00 ■ quantum coherent effects in the energy transfer in LHC at temperatures up to 300 K [2 -[j] . 

\ Theoretical studies of model Hamiltonians at different levels of approximation [5l-[ll| show that the interplay between 
. coherent transport and dissipation leads to high efficiencies in the energy transport in these systems. LHC provide a 
' remarkable example of systems where noise or dissipation aids the transport. Understanding these systems is relevant 
' } , as it gives insight into the optimal design of artificial systems such as novel nanofabricated structures for quantum 
1*"^ ' transport or optimized solar cells. 

. . , The modelling of LHC is challenging due to the lack of atomistic ab-initio methods and requires to resort to effective 
^ ' descriptions. This is most apparent in the treatment of the vibrational excitations, which are commonly described 
by a structureless mode distribution. Then the energy transfer is calculated by the time propagation of a density 
• matrix, which couples the electronic exciton dynamics to the vibrational environment. For LHC, the rearrangement 
d ' of the molecular states after the absorption of the photon has to be taken into account and is described by the 
reorganization energy. The hierarchical equations of motion (HEOM) p^ - [l^ for the time evolution of the density- 
matrix were adapted by Ishizaki and Fleming [l5j to include the reorganization process in the transport equations 
and is exact within the model of exciton dynamics coupled to a bath with a Drude-Lorentz spectral density. 

In principle the HEOM can be extended to other spectral densities by using a superposition of Drude-Lorentz peaks 
[l7| . Previous calculations for the energy-transfer efficiency of the FMO complex did not consider memory effects 
and used a weak coupling perturbation theory 0, Other models try to get around these limitations by using 
the generalized Bloch-Rcdficld equations Q , but yield different results compared to the HEOM solution of the same 
model-system. Prolonged coherent dynamics is predicted due to the slow dissipation of reorganization energy to the 
vibrational environment [l8| . Theoretical descriptions must go beyond the rotating- wave approximation, perturbation 
theory, and require a full incorporation of time non-local effects, and physiological temperature. The HEOM fulfill 
all these premises. 
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To date, only the exciton population-dynamics for the FMO model has been studied within the full hierarchical 
approach [l^, [13 whereas the calculation of efficiency or 2D absorption spectra have been considered out-of-range for 
present computational power, since they require stable algorithms to propagate enlarged system matrices over many 
more time-steps. The adverse computational scaling of the HEOM stems from the need to propagate a complete 
hierarchy of coupled auxiliary equations, which need to be simultaneously accessed in memory and propa gate d in 
time. The insufficient computational power and memory-transfer bandwidth of conventional CPU clusters [20j has 
limited the application of the HEOM to study e nergy-t ransfer efficiency in small dimer systems, where other methods 
arc available for comparison around T = K 2l|-|24|. The advent of high-performance graphics processing units 



(GPU) with several hundred stream-processors working in parallel and with a high-bandwidth memory has lead us to 
perform the full HEOM approach for the exciton model of LHC. The efficiency calculations for the FMO system in the 
strong coupling regime require to propagate 240000 auxiliary matrices up to 50 ps (corresponding to 20000 time steps). 
The full HEOM approach takes only hours of computational time on a single GPU, whereas the corresponding CPU 
calculation would run several weeks and becomes completely unfeasible for bigger LHC due to the large communication 
overhead. We use the GPU algorithmic advance to characterize the exciton energy-transfer efficiency in LHC for a wide 
range of reorganization energies under full consideration of the memory-effects and at T = 300 K. Our calculations 
reveal several important mechanisms which are not contained within the approximative methods. The GPU-HEOM 
method opens the window to a wide-spread utilization of the HEOM, including the calculation of two-dimensional 
non-linear spectra of LHC as we will discuss elsewhere. Also the implementation of a scaled version of the HEOM 
[l^ . which reduces the number of auxiliary matrices, could be achieved on a GPU and reduces the computational 
effort of hierarchical methods further. 

For the development of new theoretical chemistry and physics algorithms, GPU are important devices and consid- 
erably enlarge the class of solvable problems if one manages to devise a program code which takes full advantage of 
the GPU stream-processing architecture. For interacting many-body systems, this cannot be generally achieved by 
|)OT tin^ an existing program to the GPU, but requires to follow the vector-programming paradigm from the onset 

The manuscript is organized as follows: in Sect. we set up the model for energy transfer to the reaction center 
in the FMO complex. In Sect. IHII we calculate the key-quantities used to characterize the energy flow, namely 
the efficiency and the transfer time to the reaction center. We compute them for a wide range of reorganization 
energies and bath correlation-times within the hierarchical approach. This section contains a detailed discussion of 
the differences of the HEOM results compared to calculations based on approximative methods. Wc highlight the main 
mechanism behind the high efficiency, the delicate balance between the requirements of an energy gradient towards 
the reaction center and the detuning of the energies, as shown in Sect. IIVI In Sect. |V] wc discuss how the transport 
efficiency is optimized with respect to physiological temperature and comment on the thcrmalization properties of 
the HEOM. Finally we summarize our findings in Sect. I VII Throughout the article, we provide detailed information 
about the computational times and requirements and collect in the appendices additional detailed information about 
the algorithms used and our GPU implementation. 



II. MODEL 



The FMO protein is part of the light harvesting complex that appears in green sulfur bacteria. Its structure has 
been widely studied both with X-ray and optical spectroscopic techniques [27h29| . It has a trimer structure, with each 
of the monomers consisting of seven bacteriochlorophyll (BChl) pigment molecules, which are electronically excited 
when the energy flows from the antenna to the reaction center. An ab-initio calculation of the energy-transfer process 
within an atomistic model is far beyond present computational capabilities. Instead one has to develop effective model 
Hamiltonians such as the widely used excitonic Frenkel-Hamiltonian [T|, [s^, HH . Within the Frenkel model, which 
assumes that excitations enter the system one at a time, the seven BChl pigments of the FMO complex are treated 
as individual sites which are coupled to each other and also to the protein environment. The excitonic Hamiltonian 
is given by 

N 

T^cx = So|0)(0|+ ^(£0„ + A,„)|m)(m| 

— 1 

+ Y, '^^n{\m){n\ + \n){m\), (1) 

m>n 

where N = 7, \m) corresponds to an electronic excitation of the chromophore BChlm and |0) denotes the electronic 
ground state of the pigment protein complex where we fix the zero of energy Eq ~ 0. The site energies £,„ = e°j + Am of 
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FIG. 1: Sketch of the exciton energies of the FMO complex (|l]), the reaction center, and the ground state. Each site, designated 
with a number, represents a BChl pigment of the FMO complex. The arrows indicate the dominant inter-site couplings. The 
excitation enters the FMO complex through the chlorosome antenna located close to sites 1 and 6. The incoming excitation, 
depicted with wavy arrows pointing upwards, follows two energy pathways to the reaction center. Wavy arrows pointing 
downwards indicate radiative loss-channels leading to the electronic ground state. In addition, each site is coupled to a phonon 
bath which accounts for the protein environment surrounding the pigments. 



the chromophores consist of the "zero-phonon energies" e'^ and a reorganization energy Am , which takes into account 
the rearrangement of the complex during excitation due to the phonon bath[l[ 

N 

Hrcoig = ^ A„i|m)(TO|. (2) 

771—1 

In the following we will consider identical couplings for all sites, A„i = A. 

The inter site couplings J^n are obtained by fits to experimentally measured absorption spectra [29j . In this 
contribution we use the designations and parameters of Ref. [32j . table 4 (trimer column) and table 1 (column 
4), summarized in H] A sketch of the dominant couplings is shown in [TJ The protein environment surrounding 
the pigments is modeled as identical featureless spectral bath densities coupled to each BChl. For simplicity, we 
neglect correlations between the baths. The electronic excitations at each site couple linearly with strength di to the 
vibrational phonon modes fe| of frequency uii. The coupling Hamiltonian is given by 

■Hex-phon = ^ i'^hLOidi{bi + b]) \ \m){m\, (3) 

771=1 \ i / ,„ 

where we assume identical baths at every site. Note that the reorganization energy is related to the coupling by 
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TABLE I: Exciton Hamiltonian in the site basis in (cm ^). Bold font shows the dominant couplings and site energies. Values 
taken from Ref. [3^ . 
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We model the losses due to radiative decay from the exciton to the electronic ground state |0) introducing a dipole 
coupling to an effective radiation photon field aj, 

N 

Hex-phot = E(°- + «^)^™ + i"^)(oi) ' (4) 

m— 1 v 

which results in a finite hfe-time for the exciton. The reaction center (RC) is treated as a population-trapping state 

Ut.^p^ERc\RC){RC\ (5) 

and enlarges the system Hamiltonian to a 9 x 9 matrix. Adolphs and Renger [s^ suggest that pigments 3 and 4, which 
have the largest overlap with the energetically lowest exciton-state, couple to the reaction center. Recent experimental 
evidence shows that pigment 3 is orientated towards the reaction center |33|. In addition it has been proposed that 
an 8th pigment may play a role in the initial stages of the energy transfer [34|. Here, we include the reaction center 
by introducing leakage rates from pigments 3 and 4 to the reaction center, which acts as a population trapping state. 
Thus the coupling term to the reaction center reads 

4 

-«cx-RC = E E(«'" + «t)M^'c {\RC){m\ + \m){RC\) (6) 

m— 3 v' 

where the sum runs over the photon modes at the reaction center. As shown in Sect.|Xl Eqs. (|A7IA8p . the coupling 
can be expressed in terms of a trapping rate T^q, and similarly for the radiative decay in 2] with the rate Tphof The 
total Hamiltonian of the system is thus given by 

H Hex ^" Htrap ^" Hex— phon ^ Hex— phot 

+Hex-RC + Hphon + Hp^ot + Hpjfot' C'') 

where "Hphon = Z^i,™!^"^''^!^*)™, H°hot = J2u,rni^'^"'l(^'^)rn, and H^hot = I],.',m=3,4(^^''^^«^')m- The time evolution 
of the total density operator R{t) is described by the Liouville equation 

±Rit)^-^[n, Kit)]. (8) 

We assume that at initial time t = the total density operator factorizes in system and bath components 

i?(t = 0) = pit = 0) ® pphon ® Pphot ® Pphot, (9) 

while at later times the system and the bath get entangled. Since we are only interested in the exciton dynamics, we 
trace out the degrees of freedom of the phonon and photon environments a = {phon, phot*^, phot'^'^} and propagate 
the reduced 9x9 density matrix in the Schrodinger picture 

pit) = Tr„(e-'^(^"+^=''-p''°»+^™-p''°'+^<='=-«'=+^''''"'^i?(0)) (10) 

for the exciton system {|m)}„i=i^...^7, the ground electronic state |0), and the reaction center |RC). 

Eq. ([TU)) is obtained by formal integration of the Liouville equation ([5]). The operator Cq = [Hcx+Htrap, •] represents 
the coherent dynamics and >Cex-phon accounts for dephasing and energy relaxation due to vibrations induced by the 
interaction with the protein environment, while the recombination and energy trapping are expressed by £ex-phot and 
'Cex-RC: respectively. The parts describing the different baths are summarized in £bath = [Hphon + H^j^^j + H^^j^^, •]. 
The coupling to the phonon and photon baths can be studied with different degrees of approximation. 

We calculate the energy flow within a hybrid formulation which treats the exciton dynamics and the vibrational 
environment within the HEOM and the trapping to the reaction center and the radiative decay within a Markov 
model. The Markovian treatment of the photon modes is justified as it occurs in a very different time scale and no 
backward energy flow to the system is allowed. We abbreviate our model by ME-HEOM, see Sect. [X] We solve the 
hierarchical equations using GPUs, which are ideally suited for this task and lead to huge speed-ups of the algorithm. 
Details of the computational implementation are collected in Sect. [b1 
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III. TRAPPING TIME FOR DIFFERENT REORGANIZATION ENERGIES 

The strong coupling of the excitonic system to the vibrational environment, which is of the same order as the 
excitonic energy differences (100 cm~^), requires a detailed treatment of the phonon bath over the time-scale of the 
correlations present in the system. The coupling i s q uantified by the parameter 7 Eq. (jA17l) . ranging from (35- 
166 fs)~^ for models of light-harvesting complexes [l^l- We calculate the efficiency of the energy transfer from an 
initially excited site to the reaction center using the hierarchical equations (|A26IA27p . The efficiency 77 is defined as 
the population of the reaction center at long times 

T]^ {RC\p{t^ oo)\RC). (11) 

For the FMO complex, two sites are located near the light-absorbing antenna [32|. We consider initial excitations at 
either site 1 or 6, which give rise to two energy pathways to the reaction center. One pathway starts from site 1 and 
transfers energy via site 2 to site 3, and the second pathway starts from site 6 and the energy flows via site 7 or 5 to 
site 4, see[Il 

We fix the upper limit of time propagation at imax, defined such that the remaining population in the system, 
excluding the ground-state and reaction center, has dropped from initially 1 to 10~^. To our knowledge, no solid 
experimental data exists for the coupling strength in eq. ([6]) , given in terms of the trapping rate Frc of sites 3 or 4 
to the reaction center. In the following we assume values of Fj^^ = 2.5 ps and = 250 ps, which are of the same 

order of magnitude as in other theoretical studies 0, S [HI . The actual time scale of the energy trapping is quantified 
by the trapping time 

{t) = l^"^^^dt' t' (A(i?C|p(t)|i?C))^^^„ (12) 

where we replace the upper limit of the integral by tmax- The trapping time depends strongly on the reorganization 
energy as shown in [2] For reorganization energies A < 50 cm"-'^ the coupling to the environment assists the transport 
and the trapping time decreases when A increases. 

Evaluating the equations of motion (|A26IA27[) in the ME-HEOM approach requires to truncate the hierarchy at 
-^max, which has to be large enough to reach convergence. In[2]we adjust the truncation such that the trapping times 
for iVmax = A^ and A^max = A^ + 1 differ at most by 0.02 ps. The required truncation increases with reorganization 
energy and for A = 300 cm~^ wc need A^max = 16 where we have to propagate 245157 auxihary matrices over 22000 
time steps (Ai=2.5 fs) leading to a GPU computation time of 3.7 hours. On a standard CPU the same calculation 
takes more than one month and a systematic study of parameters is not feasible. 

In the upper panel of [2] we compare the AIE-HEOM result with the secular Redfield theory, which employs the 
time-local Born-Markov approximation in combination with the rotating-wave approximation. For stronger values 
of the coupling, the hierarchical approach strongly deviates from the plateau obtained within the secular Redfield 
theory, which assumes a fast decay of the phonon bath. The secular Redfield limit (see Sect.|X| refiects, as expected, 
the qualitative behavior only for small reorganization energies and overestimates the energy transfer to the reaction 
center for A > 10 cm~^. 

An interesting question is the existence of an optimal value for the coupling A and the bath correlation-rate 7, for 
which the trapping time is minimized (and the efficiency maximized). Secular and full Redfield do not yield a local 
minimum of the trapping time, and thus no corresponding optimal A. Introducing the bath-correlations and memory 
effects by the parameter 7 in the ME-HEOM gives rise to a local minimum and an optimal value of A, as shown in [51 
In addition an optimal value of 7 emerges around 7^^ = 25 — 35 fs. For a small value 7"^ = 5 fs, the theory predicts 
a rapid loss of efficiency. 

The lower panel of [2] details the changes of the trapping time for the two different pathways of the energy flow in the 
FMO complex as function of the reorganization energy. The optimal reorganization energy for an initial excitation of 
site 1 is given by A^p^ — 55 cm~^ ((i)opt — 6.0 ps), while for an initial excitation of site 6 wc obtain A^p^ = 85 cm^^ 
((t)6p,=5.4ps). 

Optimal values of trapping times have been calculated within the generalized Bloch-Rcdfleld (GBR) approximation 
0. Using the same parameters, couplings, and Hamiltonian as in Ref. @, the ME-HEOM yield qualitative and 
quantitative differences with a 0.9 ps longer trapping time for an initial excitation of site 1. For an initial excitation 
located at site 6 the ME-HEOM and GBR results for the trapping time differ by 0.2 ps. 

IV. EFFICIENCY FOR REARRANGED ENERGY LEVELS 

In this section we study the relevance of the spacings of the energy levels in the FMO complex to see if the experi- 
mentally obtained energy levels (|I]) are close to an optimal value with respect to transport efficiency at physiological 
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FIG. 2: Trapping time from [12] as function of reorganization energy A at temperature T = 300 K. Trapping rate to BChl 3 
and 4 Fj^^ = 2.5 ps and Fp^J^j. = 250 ps. Upper panel: secular Redfield result with = 166 fs and the ME-HEOM results 
for three different bath correlation times 7"^ = 166 fs, 7^^ = 50 fs, 7^^ = 5 fs. The excitation enters at site 1. Lower panel: 
Comparison of the trapping times for the two possible pathways in the FMO when the energy is entering the complex starting 
from site 1, or at site 6 for a bath correlation time of 7"^ — 166 fs. 



temperature. 

The isolated excitonic system shows coherent oscillations of energy between the initially populated site and the 
delocalized excitonic states. Coupling to the environment gives rise to several mechanisms leading to a non-reversible 
energy transfer. In the simplest Haken-Strobl model, only dephasing is incorporated 0, HEIj but the temperature is 
fixed at T = cxd. Only by adjusting the dephasing rate, temperature effects can be included on a rudimentary level. The 
ME-HEOM approach enables us to calculate the transport at physiological temperature (T = 300 K) and brings into 
the picture another crucial mechanism to achieve highly efficient energy transfer. Namely, the temperature dependent 
stationary site populations. Since the system is in contact with a thermal environment at finite temperattrre, there 
is energy dissipation and the system relaxes to thermal equilibrium. This process guides the excitons to the lowest 
energy states (for the FMO complex within a few picoseconds) and is not contained in pure dephasing models. 

For a small coupling A and under the assumption that the system and bath degrees of freedom factggze, the thermal 





FIG. 3: Upper panel: Energy transfer efficiency ry m[TT]as function of temperature and site-energy shifts £3/4 — ^ £3/4 + AS. 
ME-HEOM parameters: A = 35 cm~^, 7"^ = 166 fs, = 250 ps and Fj^^ = 2.5 ps. The hierarchy is truncated at A''max = 8. 

Lower panel: energy-level shifts considered in the parameter range of the left panel. 



state of the system is given by the Gibbs measure 

Pthcrmai = e-'^^^ /Tr 6"^^- , /? = l/ikeT), (13) 

which populates the eigenstates of Hex according to the Boltzmann statistics. Stronger couplings lead to deviations 
from the Boltzmann statistics (36l |. Since the coupling to the reaction center, where the system deposits its excitation, 
is linked to sites 3 and 4, the efficiency depends strongly on the population and actual site-energies 3 and 4. To study 
this relation, we shift levels £3/4 — ;> £3/4 -t- AE and compute the efficiency of the energy transfer. [3] shows the efficiency 
evaluated with the ME-HEOM. We observe an almost symmetric behavior of the efficiency for positive and negative 
energy shifts, with slightly higher efficiencies towards negative energy shifts. 

A shift to lower energies increases the energy gradient in the FMO as the thermal state prefers to populate the 
low- lying sites. This mechanism improves the transfer efficiency but shifts the two sites out of resonance and they get 
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decoupled from the other levels of the FMO. Thus coherent transport becomes more difficult and the energy transfer 
to the reaction center is expected to slow down. Similar arguments hold when the energies £3 and £4 are shifted to 
higher energies. On the one hand AE > brings the sites 3 and 4 closer to resonance and increases the coupling to 
the remaining sites, thus enhancing coherent transport. On the other hand the thermal state gets delocalized over all 
sites of the FMO complex and there is no special preference to populate site 3 and site 4. In such case the FMO loses 
its property to act as an energy funnel and environment assisted transport to the reaction center is hindered. 

[3]shows how the delicate interplay between coherent delocalization and energy dissipation towards the reaction center 
gives rise to an optimal arrangement of site energies. We obtain maximal efficiency around AE — corresponding to 
the original parameters in|T]and the optimum value is robust against small variations in the site energies. 

V. TRAPPING TIME FOR DIFFERENT TEMPERATURES 

As discussed in the previous section, the environment assists the transport towards the thermal equilibrium state. 
In the FMO complex, the sites 3 and 4 are coupled to the reaction center and present the lowest exciton energies in the 
system (see [I}, thus the energy dissipation in the phonon environment enhances the population of those sites and hence 
the efficiency. With increasing temperature one might expect high transfer efficiencies because thermalisation occurs 
on a faster time scale. However, with increasing temperature higher energy states have a higher thermal-equilibrium 
population and thus the transport efficiency towards the reaction center decreases. 

These two competing mechanisms result in an optimal temperature with maximal efficiency. Both mechanism 
are already present in the secular Redfield limit, and the optimal energy transfer is obtained around 75 K, see 
mja). Our ME-HEOM calculations predict optimal efficiency at slightly lower temperature 70 K, but this value is 
outside the range where our high-temperature implementation is supposed to work (see Sect. EJ. We obtain a steep 
increase of the trapping time for low temperatures shown in Ufa), which is also reflected in the efficiencies SKb). This 
increase in trapping time and decrease in efficiency is not present in the secular Redfield approach, which saturates 
for T — > 0. Although we take into account the lowest-order quantum correction to the Boltzmann statistics [l^, at 
low temperatures more correction terms are required. One criteria to validate the HEOM is to check if the stationary 
state p{t 00) of the population dynamics of the isolated FMO, which is decoupled from the reaction center and 
radiative decay, approaches the thermal state. As is shown in|4l[c), the HEOM high-temperature implementation fails 
to approach the thermal state for temperatures below 100 K, where the HEOM predict an unphysical steep decent 
of population at low energy site 3 and hence transfer efficiency is underestimated. For temperatures above 100 K the 
high temperature limit agrees very well with the thermal state and the ME-HEOM results are reliable. Comparing 
our ME-HEOM results above 100 K to the secular Redfield ones shown in|4l[a) and (b) we conclude that the Redfield 
approach, which is known to be valid in the weak coupling limit only, overestimates the efficiency and underestimates 
the trapping time. 

VI. CONCLUSIONS 

We have shown that the HEOM are computationally feasible for calculating the energy transfer for large systems 
following our GPU implementation. This algorithmic advance allowed us to calculate the efficiency and trapping 
time of the energy transfer in the FMO complex for a wide range of parameters. The results point to qualitative 
and quantitative deficiencies of approximative methods and show that an accurate treatment of memory effects and 
reorganization processes in the system-bath coupling of LHC is needed to evaluate the precise role of temperature, 
exciton energy-differences, the coupling strength, and the time correlations in the bath. The ME-HEOM yield longer 
trapping times and indicate the importance of memory effects and correlations in order to maximize the efficiency in 
the FMO complex at physiological temperature. Interestingly, the zero-shift energies of the FMO complex provide an 
almost optimal arrangement for funneling the energy flow to the reaction center at T = 300 K. Beyond the results for 
the FMO complex, our fast computational GPU-algorithm for the HEOM provides a robust and scalable way to treat 
bigger systems and allows us to calculate two-dimensional spectra of LHC, which requires to enlarge the dimension 
of the density matrix by taking into account double-excitonic states. 
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FIG. 4: Energy transfer as a function of temperature for the secular Redfield approximation and the exact ME-HEOM calcu- 
lation with = 166 fs, FpjJ^j. — 250 ps, Fj^^ — 2.5 ps and truncation A^max ~ 8. Both approaches use a reorganization energy 
of A =35 cm~^ and start with initial population at site 1. (a) Trapping time as a function of temperature, (b) Efficiency as a 
function of temperature, (c) Population of site 3 for different temperatures in the Boltzmann thermal equilibrium state pthormai 
and for the isolated FMO (decoupled from the reaction center and the radiative decay) using the ME-HEOM, p{t — >■ oo). Note 
that the ME-HEOM needs further corrections at temperatures below 100 K in order to reach the thermal state. 
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Appendix A: Hybrid Markov-HEOM approach 
1. Secular Redfield approximation 

We follow Ref. [131 ^^i^ expand the phonon part £phon up to second order in the exciton phonon coupling. We use 
the Born-Markov and secular approximation to obtain 

- ^£ox-phon = ^iim(w)[l/J,(w)Kn(w),»] 

7n,uj 

+ Y,hMv{vM), (Ai) 

where Vjn ~ |m)(m| stands for the exciton operators and 'D{V)p = VpV^ — ^V^Vp — ^pV'^V. The Lamb shift reads 

L„,{cj)=lm dt'e-*'"*'(w„(t')u™(0)}phon, (A2) 
Jo 

and the decoherence rates are given by 

poo 

7™(w)=2Re/ dt' e-'"'' {u,n{t')u,n{0))pi,on, (A3) 
Jo 

with the phonon operators Um.phon = ( ^'^idi{bi + ^1)),„- The exciton operators 

K^M- c:^{M)c^{N)\M){N\6{u:-Em + En) (A4) 

are evaluated in the excitonic eigenbasis |M) = 'Yl,m^in{M)\m) with l-La-x\M) = Em\M). For simplicity we assume 
that the phonon environments of the individual chromophores are uncorrelated. We additionally neglect the Lamb- 
type renormalization term. For the explicit evaluation of the decoherence rates, we quantify the strength of the 
exciton-phonon coupling and introduce a Drudc-Lorentz spectral density 

J„M = 2A™^^. (A5) 



The decoherence rates are then given by 



27rJ(-a;)(n(-a;) + 1), if w < 



7H = <; 27rY^^^, ifc. = , (A6) 

2-11 J {uj)n{Lj) , if w > 

where n{oj) = {exp{huj/kBT) — 1)^^ corresponds to the phonon statistics. Note that we neglect the index m and use 
the same parameters for all sites. 

We describe the radiative decay and trapping to the reaction center by a Lindblad ansatz 

N 

- ^/:cx-phot = rphot2?(|0)(m|), (A7) 

m— 1 

and 

4 

- ^/:cx-RC = Y ^Rc'D{\RC){m\), (A8) 

m— 3 

with identical trapping rates for all sites Fphot = 27r|/ip and Frc = 27r|/iijc|^, respectively. IA7l and R8l are derived by a 
master equation approach employing the rotating wave approximation for Hamiltonians Hpij^t and "H^^^^ respectively. 
The strength of the cxciton-photon coupling is defined by a constant spectral density Jij-o) — 2tt. We further assume 
that the photon field cannot create excitations. The same holds for the trapping to the reaction center, as no backward 
energy flow to the system is allowed. This is equivalent to previous works, where trapping and exciton recombination 
are included in the Hamiltonian in the form of anti-Hermitian parts [?[ . 
The equation of motion for the density operator in secular Redfield form reads 

^P(^) = (-^0 + -^cx-phon + -^cx-phot + -^cx-Rc) Pit)- (A9) 
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2. Combined master-equation-HEOM approach (ME-HEOM) 



For the FMO complex, the couphng of the excitons to the phonon environment is large and the reorganization 
process occurs on a time scale which is comparable to the system dynamics. The secular Rcdfield approximation 
is not valid for the coupling to the phonon bath and a non-perturbative treatment is required. We will follow the 
derivation in Ref. [15| and introduce a set of hierarchically coupled equations. For trapping-time and efhciency 



calculations we introduce slight modifications and in particular we include the coupling to the reaction center and 
the radiative decay. We derive a combined ME-HEOM approach which treats the exciton-phonon coupling exactly, 
whereas the leakage to the reaction center and exciton ground state is described in the Born-Markov limit. 

We start with the Liouville equation for the total density operator |8] and assume that the total density operator 
factorizes [9l In the interaction picture with 



Ho — He 



H 



trap 



^pho 



phot 



RC 
phot ' 



where we denote operators with 



the Liouville equation reads 



(AlO) 



(All) 



dt 



m = 



^ [^cx— phon 



H cx— phot 
( -^cx — phon H" -^cx— phot 



(A12) 



After formal integration and tracing out the bath degrees of freedom a = {phon, phot", phot^'-'} we get a formal 
solution for the reduced density operator describing the exciton degrees of freedom 



with time evolution operator 



XPphon®p|5hot®Pphot)- 



(A13) 



(A14) 



We make use of the Gaussian nature of the harmonic baths to reduce the bath expectation values to two-time 
correlation functions. Hence the influence of the environment is characterized by the symmetrized correlation 



(A15) 



and the response function 



(A16) 



where Mm, phot — l^mi'^f + We assume that each site is coupled to an independent phonon bath and that there 
are no correlations between the radiative decay and trapping at different sites. For the exciton-phonon coupling we 
employ a Drude-Lorentz spectral density 



^ +74 



and obtain, in the high temperature limit 



2Xr. 



kbT 

Xm{t) = 2A„7„,e~''™*. 



(A17) 

(A18) 
(A19) 



The parameter 7^ describes the time scale of correlations in the vibrational environment of the protein. Note that as 
we consider identical couplings for all sites, the notation is simplified in the main text and the subindex m is removed 
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from the time correlation scale of the bath 7. 
The time evolution operator becomes 



U{t) = T+ I] e/o c/o ^"-.p'-" e/o >^™,Pi.otac (.) ^^30) 

m— 1 m— 1 m— 3 

with 

ft Jo ^ 

We denote the commutation relations by O'^ / = [O, /] and 0°/ = [O, /] + . The time evolution of the reduced density 
matrix is given by 

^7 7 4 

-P{t) = T+(^Y1 W^™.phon(0 + 5Z W^-,photo(i) + 51 W^™.photRc(0)p(t). (A22) 



ni—3 



Note that due to the time ordering operator affecting the integration in Eqs. (|A21IA22[) is time non-local. In the 
following we treat the exciton-photon and exciton-reaction center couplings in the Born-Markov limit. That is, the time 
non-local operators r+ J2m=i ^m,photo{t) and r+ J2m=3 l^m, photic (i) are replaced by their time-local Born-Markov 
limit £ex-phot and £ex-RC defined in Eqs. (|A7IA8[) . respectively. Eq. (|A22[) finally reduces to 

d i i ^ 

-pit) = -^Cphotm - ^^Rcm +T+Y^ W„,,phon{t)pit)- (A23) 

m— 1 

We define auxiliary operators 



~ini,...,m) 



m,k,l 

with 



^ kBTh^ 

= (A25) 



and rewrite the time non-local effects into hierarchically coupled equations of motion 

= ^ {^^^ + '^phot + ^ncjpit) 

+ Ei^™,Phon^'""-'"'"^''-'"^'W (A26) 

m 

with 

d_ ^{ni,...,m) 



dt 



it) = (A27) 
= [ - ^{Cc. + Cpuot + Cnc) + En„7„]a("i'-'"^)(i) 

m 

+ Ei^-phon'^^""--"""''--'"^ni) 

m 

where we again have used the Born-Markov limit for the trapping and radiative decay. The auxiliary operators keep 
track of the memory effects of the bath and account for the removal of the reorganization energy. The cr-matrices 
are initially set to zero. For a sufficiently large A^^ax — X^m"-"" diagonal coupling in Eq. (jA27|) becomes the 
dominant term and we can truncate the hierarchy. 
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max 



#g-matr. CPU GPU speed up GPU utilization 
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10 



4 



6 



8 



330 120 s Is xl20 22% 

1716 676 s 3 s x225 56% 

6 435 2 636 s 7 s x376 82% 

19 448 8 275 s 19 s x435 93% 

50 388 21972 s 48 s x458 97% 



TABLE II: Comparison of CPU and GPU computation time of the population dynamics of the isolated FMO complex. We 
propagate 1000 time steps, the GPU (NVIDIA C2050) calculation are performed in single precision. Double precision (not 
required here for converged results) increases the GPU computation time by a factor of two. 



Appendix B: Algorithm for implementing the hierarchical method on graphics processing units 

For the large reorganization energies typically found in LHC one needs to go beyond the Born-Markov approach and 
to consider non-local temporal effects. We do this by solving the system dynamics within the hierarchical approach 
shown in the previous section. The method requires considerable memory and computational efforts and a large 
number of auxiliary matrices is needed to store the time non-local bath properties. Since all auxiliary matrices 
have to be accessed to perform the next propagation step, the huge communication overhead renders conventional 
parallelization schemes, where distributed computing nodes are connected by Ethernet [23|, ineffective. GPUs have 
the twofold advantage of a fast memory bandwidth and the availability of several hundred stream processors. By 
assigning one stream-processor to each auxiliary matrix we obtain a speedup of the hierarchical method by the number 
of processors. The numerical calculations in this manuscript are performed on a NVIDIA Fermi C2050 GPU with 448 
processors (1.15 GHz) and 3 gigabytes of ECC-protected on-board memory. The first step of the algorithm initializes 
the system of the cr-matrices of the hierarchy. With increasing truncation TVmax, the total number of cr-matrices grows 
factorially A^tot = {N + A^max)!/(-^'-^max!)i where A^ corresponds to the number of sites [l^. As shown in|ITl the 
calculation of a population dynamics of the FMO complex with N = 7, N^a.^ = 12 requires already 50 388 matrices 
whereas 330 matrices are sufficient for a truncation at A^max = 4. The memory of the cr-matrices is allocated on the 
graphics-board and initialized to zero. It is not necessary to transfer the cr-matrices to the main-processor memory 
at any time during the calculation. The only memory transfer between CPU and GPU involves the N x N entries 
of the reduced density operator p. To advance the propagation one time-step in eq. (|A27|) requires a large number 
of matrix multiplications. Each single cr-matrix is connected to 2N neighbors, these connections are stored in GPU 
memory in a linked-list. The GPU uses 448 cores in parallel with fast GPU memory transfer and thus provides an 
immense reduction of the computation time up to a factor of 458 for the matrix multiplications. For benchmarking 
the algorithm, we propagate 1000 time steps using a 4th order Runge-Kutta integrator. For the final output into files 
requires a short memory transfer from the GPU to the GPU. 

In|TT]we summarize the computational speed-up of the G2050-GPU compared to a standard CPU (Intel 2.40GIIz). 
The GPU computation is performed using single precision, which yields sufficient accuracy for the problem at hand. 
For the population dynamics of the FMO complex using A = 35 cm~^, 7"^ = 166 fs, temperature of 300 K. propagation 
time of 10 ps with step size At ~ 10 fs and truncation A^max = 12 the populations are accurate within single precision 
to six digits \plf^^^°{t) ~ pfi^^^" {t)\ < 5 x 10^^. This switch from single for double precision increases the computation 
time approximately by a factor of two on the C2050-GPU. 
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